SUMMARY


VLSI  Implementation  of  Neuromorphic  Learning  Networks 
Contract  Number  F49620-90-C-0042,  DEF 
P.I. - Joshua Alspector, Bellcore


Final  Report 


1.  Technical  Problem 

We  wish  to  extend  our  study  of  neural-style  learning  in  electronic  systems  to  a  usefully  large 
scale.  Our  long  term  goal  is  to  define  and  develop  an  electronic  learning  system  suitable  for 
solving  real-world  problems  using  learning  by  example. 

2.  Methodology 

The  study  of  electronic  implementation  issues  will  be  extended  to  large  scale  systems  using  a 
three  pronged  approach:  A)  Further  development  of  learning  algorithms  and  architectures 
suitable  for  modular  VLSI  implementation.  B)  Functional  simulation  of  large  scale  systems 
using  benchmark  test  problems.  C)  Design  and  fabrication  of  prototype  chips  suitable  for 
inclusion  in  and  testing  of  such  systems. 

3.  Technical  Results 

3.1  Theory 

We  have  shown  how  to  rigorously  derive  deterministic  systems  from  stochastic  ones  in  the 
Boltzmann  machine  framework  that  we  are  using  for  our  implementations.  We  have  further 
shown  how  to  search  for  new  learning  algorithms  suitable  for  VLSI  implementation  using  a 
genetic algorithm approach. We have analyzed the effect of precision constraints, such as those
found in hardware implementations, on the learning and generalization abilities of neural networks.
We have studied the learning behavior of neural networks under conditions where they can or cannot
classify perfectly. We have defined a measure that determines when either a stochastic or
deterministic system has settled, shown how to use it, and will use it in our electronic system.
For  further  details,  see  Section  1  of  the  final  report. 

3.2  Simulation 

Our  results  show  that  the  Boltzmann  machine  learning  we  use  in  our  VLSI  implementation  gives 
approximately  the  same  performance  as  the  more  popular  back-propagation  algorithm  used  in 
most simulations. Furthermore, both algorithms scale similarly to large problem sizes. We have
started  a  study  of  perturbative  learning  for  implementable  feed-forward  neural  networks.  For 
further  details,  see  Section  2  of  the  final  report. 


3.3  Implementation 

We  have  designed,  fabricated,  and  tested  an  experimental  prototype  of  a  large  learning 
microchip  containing  32  neurons  and  496  bidirectional  synapses.  The  chip  settles  either 
stochastically  (using  a  novel  electronic  noise  generator)  or  deterministically  (using  variable  gain 
neuron  amplifiers).  Learning  experiments  on  a  single  chip  microsystem  show  results  similar  to 
what  we  obtained  in  simulation.  The  chip  is  capable  of  running  at  100,000  patterns  per  second 
(100  million  connections  per  second  per  chip)  and  of  being  cascaded  to  form  systems  of  larger 
size.  We  have  completed  the  design  and  fabrication  of  a  synapse-only  chip  to  enhance 
cascadability.  For  further  details,  see  Section  3  of  the  final  report. 

4.  Further  Research 

We would eventually like to interface our simulator to a multi-chip learning system and add
other functions such as mean-field content-addressable memory.

We  would  like  to  fully  realize  the  potential  of  our  learning  microchips  by  incorporating  the  chip 
set  into  a  large  multi-chip,  VME  based  learning  microsystem. 

We  would  like  to  find  a  suitable  means  of  learning  in  analog  VLSI  for  feed-forward  and  dynamic 
neural  networks. 

We  would  like  to  apply  these  techniques  to  challenging,  real-time  problems  such  as  image 
classification. 


5.  Special  Comment  on  Integrated  Circuit  Technology  for  Neural  Networks 

Significant  progress  in  the  implementation  of  large,  multi-chip,  electronic  learning  systems  is 
hampered  by  the  state  of  current  VLSI  technology.  An  integrated  circuit  technology  which  can 
create  small  (about  1  square  micron)  learning  synapses  is  highly  desirable.  Some  modification 
of  the  current  analog  floating  gate  technology  might  be  suitable.  Furthermore,  multi-chip 
systems  would  be  most  easily  achieved  if  a  suitable  wafer-scale  integration  technology  were 
available.  Neural  networks  are  a  natural  candidate  for  using  such  a  technology. 
1.  Neural  Network  Theory
1.1  Accomplishments 

Several  authors  have  recently  proposed  deterministic  learning  algorithms  as  approximations  to 
learning in stochastic systems. We have studied^ two deterministic learning algorithms and
shown how they may be viewed as different ways of approximating the fully
stochastic system, which in this case is the Boltzmann machine. We focused in particular on
the representation of probability distributions in the deterministic systems and related them to
the true distributions. Specifically, if one takes the Boltzmann machine probability distribution
and uses a saddle point approximation, one gets the usual mean-field equations. However, if one
uses the mean-field approximation that the correlations factorize, one obtains the algorithm of
Pineda. 
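
For reference, the two approximations can be written in standard textbook notation. This is a sketch, not the report's own derivation: the symbols s_i (unit states, taken as +1 or -1), w_ij (symmetric weights), T (temperature), and m_i (mean activations) are assumptions introduced here.

```latex
% Boltzmann distribution over states s_i = \pm 1 with symmetric weights w_{ij}
P(\mathbf{s}) = \frac{1}{Z}\exp\!\left(-\frac{E(\mathbf{s})}{T}\right),
\qquad
E(\mathbf{s}) = -\tfrac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j

% Mean-field equations for the mean activations m_i = \langle s_i \rangle
m_i = \tanh\!\left(\frac{1}{T}\sum_{j} w_{ij}\, m_j\right)

% Factorization assumption on the correlations
\langle s_i s_j \rangle \approx m_i\, m_j \qquad (i \neq j)
```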

From  the  point  of  view  of  hardware  applications  it  would  be  very  useful  to  devise  learning 
algorithms  which  require  only  a  limited  precision,  such  as  using  only  binary  synaptic  weights 
and  neural  states.  One  way  of  doing  this  is  to  take  existing  learning  algorithms  and  discretize  the 
weights  during  learning.  We  have  taken  an  alternative  approach  of  using  genetic  algorithms  to 
search  the  space  of  all  possible  algorithms.^^^  In  the  case  of  a  single  layer  perceptron  with  binary 
weights,  we  have  shown  that  we  get  a  well  known  algorithm  devised  by  other  means,  namely, 
the directed drift algorithm of Venkatesh. We are now trying to extend these ideas to the
multi-layered case, about which much less is known.
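
To make the binary-weight setting concrete, the following is a minimal sketch of a directed-drift-style update for a single-layer perceptron with weights and inputs restricted to +1 and -1. It is a simplified illustration in the spirit of the algorithm named above, not code from this work; the function names `margin` and `directed_drift_step` are hypothetical.

```python
import random


def margin(w, x, y):
    """Signed margin y * (w . x); positive means the pattern is classified correctly."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def directed_drift_step(w, x, y, rng=random):
    """Return updated binary weights after presenting one pattern (x, y).

    If the pattern is misclassified, flip one randomly chosen weight whose
    term w_i * x_i disagrees with the label y. Weights stay in {-1, +1}.
    """
    w = list(w)
    if margin(w, x, y) > 0:
        return w  # already correct: leave the weights unchanged
    # Indices whose contribution pushes the sum away from the label.
    offending = [i for i, (wi, xi) in enumerate(zip(w, x)) if wi * xi * y < 0]
    if offending:
        i = rng.choice(offending)
        w[i] = -w[i]  # a single flip raises the margin by exactly 2
    return w
```

Each flip changes one term of the weighted sum from -1 to +1, so a single update raises the margin on the presented pattern by 2 while keeping every weight binary.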

We have studied learning and generalization in single-layer feedforward networks whose
weights are constrained to take on a discrete set of values.^^^ As far as we know this is the first
analytic study of the effect of weight precision (important for hardware implementations) on
the learning and generalization ability of neural networks. Our analytic results are obtained
within the replica approach and verified through Monte Carlo simulations. It is shown that,
depending  on  the  architecture  of  the  network  and  on  the  source  of  the  training  examples,  three 
qualitatively  different  behaviors  emerge.  This  distinction,  which  is  manifested  through  the 
dependence  of  the  training  and  generalization  errors  on  the  size  of  the  training  set,  suggests  a 
possible  way  to  determine  the  suitability  of  the  architecture  to  the  learning  task.  We  conjecture 
that  this  distinction  is  relevant  to  the  more  interesting  case  of  multi-layered  networks. 


We  have  calculated  the  training  and  generalization  errors  of  three  well  known  learning 
algorithms  using  methods  of  statistical  physics.^'*'  We  focus  in  particular  on  inconsistent 
algorithms  which  are  unable  to  perfectly  classify  the  training  examples,  and  show  that  the 
asymptotic  behavior  of  these  algorithms  is  different  from  the  case  of  consistent  algorithms.  Our 
results  are  in  agreement  with  bounds  derived  by  computational  learning  theorists.  We  also 
demonstrate  that  one  of  the  algorithms  studied  performs  almost  indistinguishably  from  the  Bayes 
learning  algorithm,  while  having  the  advantage  of  being  implementable  in  a  single-layer 
network.  This  last  point  is  important  if  one  is  to  systematically  evaluate  the  performance  of 
learning  systems,  and  compare  them  to  standard  statistical  approaches. 

In  feedback  neural  networks,  especially  for  static  pattern  learning,  a  reliable  method  of  settling  is 
required.  Simulated  annealing  has  been  used  but  it  is  often  difficult  to  determine  how  to  set  the 
annealing  schedule.  Often  the  specific  heat  is  used  as  a  measure  of  when  to  slow  down  the 
annealing  process,  but  this  is  difficult  to  measure.  We  have  proposed  another  measure, 
volatility, which  is  easy  to  measure  and  related  to  the  Edwards-Anderson  model  in  spin-glass 
physics.  We  have  been  studying  the  usefulness  of  this  measure  in  simulations  of  dynamics  in 
Boltzmann  and  mean-field  networks,  and  have  shown  how  to  use  it  to  speed  up  learning.  We 
have  established  a  theoretical  basis  for  the  volatility  measure  to  substitute  for  the  specific  heat  in 
annealing.  Simulations  have  verified  the  validity  of  this  measure  and  shown  how  to  use  it  to 
speed up annealing and learning. This quantity is far easier to measure than specific heat
because it requires knowledge of only the neural states, not the weights. It holds promise as an
easily measured means of controlling the noise and gain in our neural network chips.
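
The report does not state how volatility is computed. Purely as an illustration, the sketch below assumes a simple proxy: the fraction of units whose state changes between consecutive snapshots of the network. The function name and this definition are assumptions, not the authors' formula; the point is only that a quantity of this kind needs the neural states and no weights.

```python
def volatility(history):
    """Fraction of unit-state changes between consecutive snapshots.

    `history` is a list of snapshots, each a list of unit states
    (for example +1 / -1). This proxy definition is an assumption.
    """
    flips = 0
    total = 0
    for prev, cur in zip(history, history[1:]):
        flips += sum(p != c for p, c in zip(prev, cur))
        total += len(cur)
    return flips / total if total else 0.0


# Three snapshots of a 4-unit network: one unit flips in each transition.
snapshots = [
    [1, 1, -1, -1],
    [1, -1, -1, -1],
    [1, -1, -1, 1],
]
v = volatility(snapshots)
```

Under this assumed definition, a value near zero would indicate that the network has settled, which is the kind of signal an annealing controller could use to decide when to lower the noise or raise the gain.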

2.  Neural  Network  Simulation 
2.1  Accomplishments 

We  presented  a  paper^^^  at  the  Neural  Information  Processing  Systems  (NIPS)  conference  in 
November,  1990.  The  paper  shows,  by  simulation  of  benchmark  test  problems  such  as  NETtalk, 
that  network  learning  algorithms  of  the  type  we  are  implementing  (Boltzmann  and  mean-field) 
work as well as the far more commonly used back-propagation technique. Since some form of
feedback connection is required so that the teacher signal on the output neurons can modify
weights  during  supervised  learning,  we  argue  that  full  time  feedback,  as  opposed  to  the  part-time 
feedback  of  back-propagation,  is  more  plausible  to  investigate,  even  for  static  pattern  learning 
where  the  dynamics  of  recurrent  connections  are  not  utilized  fully.  Relaxation  methods  are 
needed  for  learning  static  patterns  with  full-time  feedback  connections.  Feedback  network 
learning  techniques  have  not  achieved  wide  popularity  because  of  the  still  greater  computational 
efficiency  of  back-propagation.  We  show  by  simulation  that  relaxation  networks  of  the  kind  we 
are  implementing  in  VLSI  are  capable  of  learning  large  problems  just  like  back-propagation 
networks.  The  availability  of  hardware  learning  should  give  a  boost  to  these  methods.  Our 
benchmark  problems  are  parity,  replication,  and  NETtalk. 

We  presented  a  paper^^^  at  the  Neural  Information  Processing  Systems  (NIPS)  conference  in 
December,  1992.  This  described  a  parallel,  stochastic  method  for  learning  in  feed-forward 
networks  without  doing  back-propagation  of  errors.  The  work  focused  on  a  perturbation 
technique  that  measures,  not  calculates,  the  gradient.  Since  the  technique  uses  the  actual 
network as a measuring device, errors in modeling neuron activation and synaptic weights do not
cause errors in gradient descent. Simulations showed that the method learns and scales well.
We used the benchmark problems of parity, replication, contiguity, and Hamming coding to
check  scaling  properties.  It  appears  that  we  can  exploit  the  parallel  nature  in  an  implementation 
to  achieve  a  speedup  over  computer  simulation.  This  is  a  component  of  a  current  proposal  to 
ARPA  for  VLSI  implementation  of  learning  for  use  in  image  classifiers. 
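
The idea of measuring rather than calculating the gradient can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as implemented in this work: all weights are perturbed at once by a random +delta or -delta, the resulting change in error is measured, and each weight is adjusted in proportion to that change divided by its own perturbation. The names `perturbation_step`, `error_fn`, `delta`, and `lr` are hypothetical.

```python
import random


def perturbation_step(weights, error_fn, delta=1e-3, lr=0.1, rng=random):
    """One parallel perturbation update.

    `error_fn(weights)` stands in for a measurement on the real network,
    so no analytic model of the neurons or synapses is needed.
    """
    signs = [rng.choice((-1.0, 1.0)) for _ in weights]
    base_error = error_fn(weights)
    perturbed = [w + delta * s for w, s in zip(weights, signs)]
    d_error = error_fn(perturbed) - base_error
    # d_error / (delta * s_i) is a noisy estimate of dE/dw_i; the noise from
    # the other weights' perturbations averages out over many steps.
    return [w - lr * d_error / (delta * s) for w, s in zip(weights, signs)]
```

A usage example would call `perturbation_step` repeatedly with an `error_fn` that evaluates the network on training patterns. Because every weight is perturbed in the same measurement, the cost per step does not grow with the number of weights, which is what makes the scheme attractive for parallel hardware.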

3.  Neural  Network  Implementation 

3.1  Accomplishments 

We  have  designed,  fabricated,  and  performed  functional  and  learning  tests  on  an  experimental 
prototype  of  a  32  neuron  learning  microchip.^*^  This  160,000  transistor  chip  also  contains  496 
bi-directional  synapses  and  a  32  channel  uncorrelated  noise  generator.  We  have  measured  the 
transfer  functions  of  the  analog  neuron  and  the  analog  portions  of  the  synapse  and  demonstrated 
variable  gain.  We  have  verified  the  functionality  of  the  digital  portion  of  the  synapse.  We  have 
also  shown  that  the  noise  generator  works  and  demonstrated  its  effect  on  the  neuron  transfer 
function.  We  have  built  a  set  of  test  boards  for  performing  learning  using  an  experimental 
prototype  single  chip.  We  have  integrated  the  chip,  boards,  data  generators  and  analyzers,  and 
an  X-windows  based  interface  into  a  learning  system  based  on  the  chip. 

More  importantly,  we  have  demonstrated  that  the  chip  learns.  This  work  follows  up  on  the 
previous  year’s  simulation  paper  by  performing  Boltzmann  and  mean-field  experiments  in  actual 
learning  hardware  rather  than  just  simulations.  The  results  obtained  are  similar  to  those  obtained 
in  software.  Measurements  show  that  the  potential  learning  speed  of  the  hardware  is  100,000 
patterns  per  second  roughly  independent  of  the  problem  size.  If  all  the  synapses  on  the  chip 
were  utilized,  this  would  imply  a  learning  speed  of  100  million  connection  updates  per  second 
(CUPS)  per  chip.  This  is  roughly  10,000  times  faster  than  Sparc  2  simulations  of  Boltzmann 
learning. 
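
The figures quoted above can be checked with simple arithmetic. The sketch below assumes that the 496 synapses correspond to full pairwise connectivity among the 32 neurons, and that each bidirectional synapse counts as two connections when computing CUPS; the second assumption is one way to reconcile the numbers and is not stated in the report.

```python
neurons = 32
# One bidirectional synapse per pair of distinct neurons.
synapses = neurons * (neurons - 1) // 2            # 496

patterns_per_second = 100_000
# Assumption: each bidirectional synapse counts as two connections.
connections = 2 * synapses                         # 992
cups = patterns_per_second * connections           # 99_200_000, about 100 million

# The quoted 10,000x speedup then implies roughly this rate for the
# Sparc 2 simulation (an inferred figure, not a reported measurement).
implied_simulation_cups = cups / 10_000            # 9920.0
```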

We  have  completed  design  and  fabrication  of  an  experimental  prototype  of  a  1024  bi-directional 
synapse  chip  to  enhance  cascadability  of  the  above  neuron-containing  chip.^^^  This  chip  has 
been  tested  for  functionality  and  appears  to  work  well.  The  two  chips  together  can  form  the 
building  blocks  for  a  much  larger  neural  system. 

We  presented  a  paper^**^  at  the  1992  IEEE  Neural  Networks  for  Signal  Processing  Workshop  at 
Elsinore,  Denmark,  September,  1992.  This  showed  how  our  Boltzmann  and  mean-field 
prototype  chip  can  be  used  for  content  addressable  memory  with  a  capacity  far  larger  than 
ordinary  Hopfield  memory  by  using  mean-field  settling  and  hidden  units.  Results  show  good 
agreement  between  simulations  and  the  actual  chip.  A  possible  use  of  this  technique  would  be 
for  vector  quantization  or  other  coding. 

3.2  Other  Deliverables 

Video  available  (delivered  to  B.  Yoon,  Seattle,  July  11,  1991);  Joshua  Alspector  and  Anthony 
Jayakumar,  "Bellcore  Neural  Learning  System  -  The  Video";  A  learning  neural  microchip  which 
settles  using  both  electronic  noise  and  variable  gain  neurons  is  described.  The  functions  of  the 
various components are demonstrated on oscilloscopes. An integrated software-hardware
system  which  can  learn  by  example  is  demonstrated.  An  example  learning  task  performed  by 
the  chip  is  displayed  in  an  X-windows  based  software  system.  Videotaped  on  location  in  Room 
2E-377,  Bellcore,  Morristown,  NJ,  June  24-25, 1991.  Copyright  Bellcore  1991. 


Microchip available (delivered to B. Yoon, Seattle, July 11, 1991).

Assorted  view-graphs  including  both  the  neuron-containing  and  synapse-only  learning  chips  are 
available. 